Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest

نویسندگان

  • Usman Roshan
  • Satish Chikkagoudar
  • Zhi Wei
  • Kai Wang
  • Hakon Hakonarson
چکیده

We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where r is the number of SNPs with P-values within the Bonferroni correction, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications we compare the ranks of previously replicated SNPs in real data, associated regions in type 1 diabetes, as provided by the Type 1 Diabetes Consortium, and disease risk prediction accuracies as given by top ranked SNPs by the three methods. Software and webserver are available at http://svmsnps.njit.edu.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

متن کامل

Prognosis of multiple sclerosis disease using data mining approaches random forest and support vector machine based on genetic algorithm

Background: Multiple sclerosis (MS) is a degenerative inflammatory disease which is most commonly diagnosed by magnetic resonance imaging (MRI). But, since the MRI device uses of a magnetic field, if there are metal objects in the patient's body, it can disrupt the health of the patient, the functioning of the MRI, and distortion in the images. Due to limitations of using MRI device, screening ...

متن کامل

Improvement of Support Vector Machine and Random Forest Algorithm in Predicting Khorramabad River Flow Uusing Non-uniform De-Noising of data and Simplex Algorithm

In this study, in order to simulate the monthly flow of the Khorramabad River, the time series of this river was decomposed into three levels using the wavelet of Daubechies-3, during the period of 1955-2014. Based on this, it was found that there is a Non-uniform noise that includes two periods of time in this signal, with the October 2008 border which required that the signal be become non-un...

متن کامل

Predicting the cause of kidney stones in patients using random forest, support vector machine and neural network

Background: Today, with the advancement of technology in various fields, the importance of recording data in the field of health is increasing so much that for many diseases around the world, including kidney disease, registration systems have been set up. This is happening in our country and in the future, the number of these systems will increase. The medical data set contains valuable inform...

متن کامل

Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis

Extended Abstract Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlate with other economically important traits in cattle including fertility, longevity and carcass traits. The present study aimed to conduct a genome wide association studies (GWAS) based on gene-set enrichment analysis for identifying the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 39  شماره 

صفحات  -

تاریخ انتشار 2011